📦 socr

A prototype R package for modeling
tracking and event data in soccer

Why?

  • No public R packages for modeling tracking and event data in soccer1
  • Public soccer analytics is typically model-specific
  • Value in a “general” framework for doing common modeling tasks

Other reasons…

  • Wanted hands-on experience in sports analytics
  • My career: academia (social science, networks, causal inference) to data science

Package design

Design principles

  • Do the least possible:
    • Metrics and models
    • No plot functions; leave data import/API/standardization to other packages
  • Extendable, not tied to a specific model
  • Minimal dependencies: data.table, vctrs, rlang

Design abstraction

flowchart LR
  A[DB]:::external --> B[socr]
  B:::socr --> C("as_event() \n DT")
  B:::socr --> D("as_tracking() \n DT")
  C:::function --> E("as_action() \n rcrd")
  D:::function --> F("as_position() \n rcrd")
  E:::function --> G("metrics(a, p)")
  F:::function --> G("metrics(a, p)")
  G:::metrics --> H("models(trk, evt)"):::models
  
  classDef external fill:#eee,stroke:#36454F,font-family:Fira Sans;
  classDef socr font-family:Fira Sans;
  classDef function fill:#f96,font-family:Fira Code,font-size:0.85em;
  classDef metrics fill:#48C9B0,font-family:Fira Code,font-size:0.85em;
  classDef models fill:#48A1C9,font-family:Fira Code,font-size:0.85em;

\(~\)

Coercion

  • simplification
  • method dispatch
    • getters

Metrics

  • building blocks
  • consistent representation

Models

  • interpretable output
  • internal coercion

Coercion, classes, and rcrds

Tracking data

library(socr)

Tracking data

library(socr)

data(metrica_tracking)
head(metrica_tracking) 
  period time   entity      x       y team
1      1 0.04 player11 0.0984 38.5904 home
2      1 0.08 player11 0.1152 38.5904 home
3      1 0.12 player11 0.1368 38.5904 home
4      1 0.16 player11 0.1452 38.5904 home
5      1 0.20 player11 0.1548 38.5904 home
6      1 0.24 player11 0.1680 38.5904 home

Coercion to tracking object

trk <- as_tracking(metrica_tracking)

Coercion to tracking object

trk <- as_tracking(metrica_tracking)
print(trk)
Key: <entity, team, time>
         period   team  entity    time       x       y
          <int> <char>  <char>   <num>   <num>   <num>
      1:      1   ball    ball    0.04 54.5664 30.9672
      2:      1   ball    ball    0.08 59.5740 32.5248
      3:      1   ball    ball    0.12 64.4592 34.0448
      4:      1   ball    ball    0.16 66.4152 33.7848
      5:      1   ball    ball    0.20 66.6144 32.4560
     ---                                              
1639161:      1   home player9 2850.56 58.3908 84.0000
1639162:      1   home player9 2850.60 58.3908 84.0000
1639163:      1   home player9 2850.64 58.3908 84.0000
1639164:      1   home player9 2850.68 58.3908 84.0000
1639165:      1   home player9 2850.72 58.3908 84.0000

Coercion to tracking object

trk <- as_tracking(metrica_tracking)
print(trk)
Key: <entity, team, time>
         period   team  entity    time       x       y
          <int> <char>  <char>   <num>   <num>   <num>
      1:      1   ball    ball    0.04 54.5664 30.9672
      2:      1   ball    ball    0.08 59.5740 32.5248
      3:      1   ball    ball    0.12 64.4592 34.0448
      4:      1   ball    ball    0.16 66.4152 33.7848
      5:      1   ball    ball    0.20 66.6144 32.4560
     ---                                              
1639161:      1   home player9 2850.56 58.3908 84.0000
1639162:      1   home player9 2850.60 58.3908 84.0000
1639163:      1   home player9 2850.64 58.3908 84.0000
1639164:      1   home player9 2850.68 58.3908 84.0000
1639165:      1   home player9 2850.72 58.3908 84.0000

\(~\)

class(trk)
[1] "tracking"   "data.table" "data.frame"

Tracking object validation

# socr:::validate_tracking()

validate_tracking <- function(x) {
  # check the object
}

Tracking object validation

# socr:::validate_tracking()

validate_tracking <- function(x) {
  vec_assert(field(x, "period"), integer())
  vec_assert(field(x, "entity"), character())
  vec_assert(field(x, "team"),   character())
  vec_assert(field(x, "time"),   double())
  vec_assert(field(x, "x"),      double())
  vec_assert(field(x, "y"),      double())
}

Tracking object validation

# socr:::validate_tracking()

validate_tracking <- function(x) {
  vec_assert(field(x, "period"), integer())
  vec_assert(field(x, "entity"), character())
  vec_assert(field(x, "team"),   character())
  vec_assert(field(x, "time"),   double())
  vec_assert(field(x, "x"),      double())
  vec_assert(field(x, "y"),      double())
  
  if (!identical(key(x), c("entity", "team", "time"))) {
    setkeyv(x, cols = c("entity", "team", "time"))
  }
  
  x
}

Coercing tracking to position

A position rcrd is a minimal representation of tracking data. It has five fields which must be vectors of the same length1.

p <- as_position(trk)

Coercing tracking to position

A position rcrd is a minimal representation of tracking data. It has five fields which must be vectors of the same length.

p <- as_position(trk)
class(p)
[1] "position"   "vctrs_rcrd" "vctrs_vctr"

Coercing tracking to position

A position rcrd is a minimal representation of tracking data. It has five fields which must be vectors of the same length.

p <- as_position(trk)
class(p)
[1] "position"   "vctrs_rcrd" "vctrs_vctr"

\(~\)

fields(p)
[1] "entity" "team"   "time"   "x"      "y"     

\(~\)

print(p)
<position[1639165]>
t: 0.04 e: ball <ball> xy: 54.6, 31.0
t: 0.08 e: ball <ball> xy: 59.6, 32.5
t: 0.12 e: ball <ball> xy: 64.5, 34.0
t: 0.16 e: ball <ball> xy: 66.4, 33.8
t: 0.20 e: ball <ball> xy: 66.6, 32.5
t: 0.24 e: ball <ball> xy: 66.8, 31.1
t: 0.28 e: ball <ball> xy: 67.0, 29.8
t: 0.32 e: ball <ball> xy: 67.2, 28.5
t: 0.36 e: ball <ball> xy: 67.4, 27.1
t: 0.40 e: ball <ball> xy: 67.6, 25.8
... with 1639155 more positions

Position constructor

Optionally, construct a position from scratch.

p_constructed <- 
  position(
    entity = "Kerr", # vector recycling
    team   = "Chelsea",
    time   = 10:13,
    x      = c(20, 21, 23, 26),
    y      = c(50, 48, 47, 45)
  )

print(p_constructed)
<position[4]>
t: 10.00 e: Kerr <Chelsea> xy: 20.0, 50.0
t: 11.00 e: Kerr <Chelsea> xy: 21.0, 48.0
t: 12.00 e: Kerr <Chelsea> xy: 23.0, 47.0
t: 13.00 e: Kerr <Chelsea> xy: 26.0, 45.0

Getters

p_constructed
<position[4]>
t: 10.00 e: Kerr <Chelsea> xy: 20.0, 50.0
t: 11.00 e: Kerr <Chelsea> xy: 21.0, 48.0
t: 12.00 e: Kerr <Chelsea> xy: 23.0, 47.0
t: 13.00 e: Kerr <Chelsea> xy: 26.0, 45.0
get_time(p_constructed)
[1] 10 11 12 13


get_entity(p_constructed)
[1] "Kerr" "Kerr" "Kerr" "Kerr"


get_team(p_constructed)
[1] "Chelsea" "Chelsea" "Chelsea" "Chelsea"


get_location(p_constructed)
      x  y
Kerr 20 50
Kerr 21 48
Kerr 23 47
Kerr 26 45

Metrics

Metrics principles

All metrics are calculated from position or action objects


Dimensions Representation Examples
1 vector
  • speed_ratio()

  • theta()

2 matrix
  • expected_position()
≥ 3 array
  • displacement()

  • velocity()

Displacement

By default, the displacement metric is calculated within-entity

print(p, n = 5)
<position[1639165]>
t: 0.04 e: ball <ball> xy: 54.6, 31.0
t: 0.08 e: ball <ball> xy: 59.6, 32.5
t: 0.12 e: ball <ball> xy: 64.5, 34.0
t: 0.16 e: ball <ball> xy: 66.4, 33.8
t: 0.20 e: ball <ball> xy: 66.6, 32.5
... with 1639160 more positions


displacement(p)[1:5, ]
          x       y       xy
ball     NA      NA       NA
ball 5.0076  1.5576 5.244252
ball 4.8852  1.5200 5.116207
ball 1.9560 -0.2600 1.973205
ball 0.1992 -1.3288 1.343648

Displacement

By providing from and/or to, displacement is calculated between-entity

d <- displacement(p, from = "player16", to = "player19")
# d[from, to, time, metric]


d["player16", "player19", "3", "x"]
[1] 20.9676

Displacement

By providing from and/or to, displacement is calculated between-entity

d <- displacement(p, from = "player16", to = "player19")
# d[from, to, time, metric]


d["player16", "player19", "3", "x"]
[1] 20.9676


d["player16", , , ]
        x      y       xy
1 26.0976 7.9640 27.28571
2 24.2400 8.7288 25.76373
3 20.9676 6.8544 22.05953
4 18.7344 4.5152 19.27083


d[, , , "x"]
      1       2       3       4 
26.0976 24.2400 20.9676 18.7344 


d[, , 3, ]
       x        y       xy 
20.96760  6.85440 22.05953 

Displacement

Providing multiple values in from and to returns an array with dimensions from \(\times\) to \(\times\) time \(\times\) 3 (x, y, z)

home_players <- vec_unique(get_entity(p, team = "home"))
d <- displacement(p, from = home_players, to = home_players)
round(d[, , 1, "x"], 2)
         player1 player10 player11 player2 player3 player4 player5 player6
player1     0.00   -26.84    38.81   -1.06    1.08   -0.02  -10.73  -10.37
player10   26.84     0.00    65.64   25.78   27.92   26.82   16.11   16.47
player11  -38.81   -65.64     0.00  -39.87  -37.72  -38.83  -49.53  -49.17
player2     1.06   -25.78    39.87    0.00    2.15    1.04   -9.66   -9.30
player3    -1.08   -27.92    37.72   -2.15    0.00   -1.11  -11.81  -11.45
player4     0.02   -26.82    38.83   -1.04    1.11    0.00  -10.70  -10.34
player5    10.73   -16.11    49.53    9.66   11.81   10.70    0.00    0.36
player6    10.37   -16.47    49.17    9.30   11.45   10.34   -0.36    0.00
player7     8.85   -17.99    47.66    7.79    9.94    8.83   -1.87   -1.51
player8    14.90   -11.94    53.70   13.84   15.98   14.88    4.17    4.53
player9    24.41    -2.43    63.22   23.35   25.50   24.39   13.69   14.05
         player7 player8 player9
player1    -8.85  -14.90  -24.41
player10   17.99   11.94    2.43
player11  -47.66  -53.70  -63.22
player2    -7.79  -13.84  -23.35
player3    -9.94  -15.98  -25.50
player4    -8.83  -14.88  -24.39
player5     1.87   -4.17  -13.69
player6     1.51   -4.53  -14.05
player7     0.00   -6.05  -15.56
player8     6.05    0.00   -9.51
player9    15.56    9.51    0.00

Method dispatch

When desired, metric functions can operate on position or action objects, handled via dispatch

velocity <- function(z, ...) {
  UseMethod("get_location")
}

velocity.position <- function(z, ...) {
  
  if (!is_position(z)) {
    abort()
  }

  do_this(x, time)
  
}

velocity.action <- function(z, ...) {
  
  if (!is_action(z)) {
    abort()
  }
  
  do_that(x_start, x_end, time)
  
}

Models

Model principles

  • All models take tracking and/or event objects
  • Models coerce to position and action objects internally, with object validation
  • Required metrics are calculated from position and action objects
  • Models should return interpretable data.frame / data.table output

Pitch control

pitch_control <- function(tracking, grid = c(120, 80), cells = 200) {

  validate_tracking(tracking)
  players <- as_position(tracking, ball = FALSE, expand = TRUE)
  
}

Pitch control

pitch_control <- function(tracking, grid = c(120, 80), cells = 200) {

  validate_tracking(tracking)
  players <- as_position(tracking, ball = FALSE, expand = TRUE)
  
  # metrics
  lo <- get_location(players)
  mu <- expected_position(players)
  th <- theta(players)
  sr <- speed_ratio(players)
  ir <- influence_radius(as_position(tracking, expand = TRUE))
  
}

Pitch control

pitch_control <- function(tracking, grid = c(120, 80), cells = 200) {

  ...
  
  # rotation and scaling matrices
  rotation <- lapply(
    th,
    function(x) matrix(c(cos(x), sin(x), -sin(x), cos(x)), nrow = 2)
  )

  scaling <- purrr::pmap(
    list(as.numeric((1 + sr) * ir / 2),
         as.numeric((1 - sr) * ir / 2)),
    function(x, y) matrix(c(x, 0, 0, y), nrow = 2)
  )

  # sigma
  sigma <- purrr::pmap(
    list(rotation, scaling),
    function(x, y) x %*% y %*% y %*% solve(x)
  )
  
}

Pitch control

pitch_control <- function(tracking, grid = c(120, 80), cells = 200) {

  ...
  
  # influence by player for each pitch cell
  pitch_influence <- purrr::pmap(
    list(mu, sigma),
    function(x, y) mvtnorm::dmvnorm(pitch, x, y)
  )

  # influence of player at location
  location_influence <- purrr::pmap_dbl(
    list(lo, mu, sigma),
    function(x, y, z) mvtnorm::dmvnorm(x, y, z)
  )

  influence <- do.call(cbind, pitch_influence) %*%
    Matrix::Diagonal(x = 1 / location_influence)
  
}

Pitch control

pitch_control <- function(tracking, grid = c(120, 80), cells = 200) {
  
  ...
  
  # multiply by -1 for summation convenience
  influence[, away_team] <- influence[, away_team] * -1
  
  summation <- rowsum(x = t(as.matrix(influence)),
                      group = get_time(players),
                      na.rm = TRUE)
  
  pc <- 1 / (1 + exp(summation))
  
  # return as data.table
  
}

Speed and memory

trk_short <- trk[time %in% seq(834, 857, by = 0.2)]
bench::mark(
  pc <- pitch_control(trk_short, grid = c(120, 80), cells = 200),
  max_iterations = 1
)$median
[1] 7.92s
pc
          time     x          y control
         <num> <num>      <num>   <num>
      1: 834.2     0  0.0000000     0.5
      2: 834.2     0  0.4020101     0.5
      3: 834.2     0  0.8040201     0.5
      4: 834.2     0  1.2060302     0.5
      5: 834.2     0  1.6080402     0.5
     ---                               
4599996: 857.0   120 78.3919598     0.5
4599997: 857.0   120 78.7939698     0.5
4599998: 857.0   120 79.1959799     0.5
4599999: 857.0   120 79.5979899     0.5
4600000: 857.0   120 80.0000000     0.5

Pitch control

Implementation of Fernández and Bornn (2018), Wide Open Spaces, MIT SSAC

Next steps

Limitations and challenges

  • Scaling models like pitch_control() to multiple matches

  • Finding a good minimal representation of event data for as_action()

  • Managing dependencies for machine learning (tidymodels, tensorflow, torch)

End

Thanks!